Antipattern: Use Null as an Ordinary Value, or Vice Versa
Let's see what happens when we treat the null value as an ordinary value or vice versa.
Many software developers are caught off-guard by the behavior of NULL in SQL. Unlike most programming languages, SQL treats NULL as a unique value that is distinct from zero, false, and empty strings. This is true in standard SQL and most brands of databases. However, in Oracle and Sybase, NULL is precisely the same as a string of zero length. The NULL value follows some unique behavior, too.
Using NULL value in expressions#
One case that surprises some people is when we perform arithmetic on a column or expression that is NULL. For example, many programmers would expect the result to be 10 for bugs that haven’t been given an estimate in the hours
column, but instead, the query returns NULL
.
NULL is not the same as zero. A number that is ten greater than an unknown number is still an unknown number.
NULL is not the same as a string of zero length. Combining any string with NULL in standard SQL returns NULL (despite its behavior in Oracle and Sybase).
NULL is not the same as false. Boolean expressions also produce results that some people find confusing.
Searching nullable columns#
The following query returns only those rows in which assigned_to
has the value 123, and not rows with other values or in which the column is null:
It may seem that the following query returns the complement set of rows to the previous query, that is, all those rows that are not returned by the previous query. However, this query doesn’t actually return any rows at all. Let’s try it in the following playground:
However, neither query result includes rows where assigned_to
is NULL. Any comparison to null returns unknown
, not true
or false
. The negation of NULL is still NULL.
It’s common to make the following mistakes when searching for NULL or non-NULL values:
Now, we will retrieve the bugs assigned to non-NULL in the following widget:
The condition in a WHERE
clause is satisfied only when the expression is true, but a comparison to NULL is never true; it’s unknown. It doesn’t matter whether the comparison is for equality or inequality. Thus, neither of the previous queries returns rows where assigned_to
is NULL.
Using NULL value in query parameters#
It’s also difficult to use NULL in a parameterized SQL expression as if NULL were an ordinary value.
The previous query returns predictable results when we send an ordinary integer value for the parameter, but we can’t use a literal NULL as the parameter.
Avoiding the issue#
If handling NULL makes queries more complex, many software developers choose to disallow NULL altogether in the database. Instead, they choose an ordinary value to signify “unknown” or “inapplicable.”
We hate NULLs!
What exactly is wrong with this practice? In the following example, let’s declare the previously nullable columns assigned_to
and hours
as NOT NULL
:
Let’s say we use -1 to represent an unknown value.
The hours
column is numeric, so we’re restricted to a numeric value to mean “unspecified.” It has to have no meaning in that column, so we chose a negative value. But the value -1 would throw off calculations such as SUM()
or AVG()
. We have to exclude rows with this value, using special-case expressions, which is what we were trying to avoid by prohibiting null in the first place.
In another column, the value -1 might be significant, so we have to choose a different value on a case-by-case basis for each column. We also have to remember or document the special values used by each column. This adds a lot of arduous and unnecessary work to a project.
Now let’s look at the assigned_to
column. It is a foreign key to the Accounts
table. When a bug has been reported but not assigned yet, what non-NULL value can we use? Any non-NULL value must reference a row in Accounts
, so we need to create a placeholder row in Accounts
, meaning “no one“ or “unassigned.” It seems ironic to create an account to reference so that we can represent the absence of a reference to a real user’s account.
When we declare a column as NOT NULL
, it should be because it would make no sense for the row to exist without a value in that column. For example, the Bugs.reported_by
column must have a value because every bug was reported by someone. But a bug may exist without having been assigned yet. Missing values should be NULL.